Noncausal Variants Research Articles

The identification of causal BRCA1/2 pathogenic variants (PVs) in epithelial ovarian carcinoma (EOC) aids the selection of patients for genetic counselling and treatment decision-making. Current recommendations therefore stress sequencing of all EOCs, regardless of histotype. Although it is recognised that BRCA1/2 PVs cluster in high-grade serous ovarian carcinomas (HGSOC), this view is largely unsubstantiated by detailed analysis. Here, we aimed to analyse the results of BRCA1/2 tumour sequencing in a centrally revised, consecutive, prospective series including all EOC histotypes. Sequencing of n = 946 EOCs revealed BRCA1/2 PVs in 125 samples (13%), only eight of which were found in non-HGSOC histotypes. Specifically, BRCA1/2 PVs were identified in high-grade endometrioid (3/20; 15%), low-grade endometrioid (1/40; 2.5%), low-grade serous (3/67; 4.5%), and clear cell (1/64; 1.6%) EOCs. No PVs were identified in any mucinous ovarian carcinomas tested. By re-evaluation and using loss of heterozygosity and homologous recombination deficiency analyses, we then assessed: (1) whether the eight 'anomalous' cases were potentially histologically misclassified and (2) whether the identified variants were likely causal in carcinogenesis. The first 'anomalous' non-HGSOC with a BRCA1/2 PV proved to be a misdiagnosed HGSOC. Next, germline BRCA2 variants, found in two p53-abnormal high-grade endometrioid tumours, showed substantial evidence supporting causality. One additional, likely causal variant, found in a p53-wildtype low-grade serous ovarian carcinoma, was of somatic origin. The remaining cases showed retention of the BRCA1/2 wildtype allele, suggestive of non-causal secondary passenger variants. We conclude that likely causal BRCA1/2 variants are present in high-grade endometrioid tumours but are absent from the other EOC histotypes tested. Although the findings require validation, these results seem to justify a transition from universal to histotype-directed sequencing. 
Furthermore, in-depth functional analysis of tumours harbouring BRCA1/2 variants combined with detailed revision of cancer histotypes can serve as a model in other BRCA1/2-related cancers. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.

Bulked segregant analysis (BSA) is an efficient and low-cost strategy that is widely used to identify causal genes in segregating populations. BSA-based methods, such as BSA sequencing (Wenger et al., 2010), bulked segregant RNA sequencing (BSR-seq) (del Viso et al., 2012), and MutMap (Abe et al., 2012), are powerful tools for rapidly discovering genetic markers and mapping genes. Although BSA is increasingly being used in wheat (Triticum aestivum) gene mapping efforts, few user-friendly BSA tools have been developed for researchers lacking a strong bioinformatics background. Here, we developed the web-based BSA platform WheatGmap (https://www.wheatgmap.org), which integrates multiple BSA mapping models and large amounts of public data to accelerate gene cloning and functional research and facilitate resource sharing. WheatGmap contains three sub-databases: Variants, Expression, and Data Share (Figure 1A).
The current version of WheatGmap contains more than 3500 next-generation sequencing datasets of hexaploid wheat, including whole-genome sequencing (WGS), whole-exome sequencing (WES), and transcriptome deep-sequencing (RNA-seq) datasets, which were generated from wheat ethyl methanesulfonate mutants, downloaded from public databases, or shared by users. Using the International Wheat Genome Sequencing Consortium (IWGSC) RefSeq v.1.0 as a reference genome (McKenna et al., 2010; IWGSC, 2018), single-nucleotide polymorphisms (SNPs) or insertions/deletions (InDels) were called by running the data analysis pipeline (https://wheatgmap.org/document/dataAnalysis/), which followed Genome Analysis Toolkit best practices (https://gatk.broadinstitute.org/) (Supplemental Information and Supplemental Figure 1). All the datasets are available in the web-based platform to query, compare, and export variants in a user-friendly manner. To help researchers use these numerous resources for gene mapping, we have integrated BSA, the Basic Local Alignment Search Tool (BLAST), and tools that facilitate batch gene function queries, batch gene expression queries, and gene enrichment analysis into the platform (Figure 1B).
Several BSA models can be implemented in the platform, including SNP index (Abe et al., 2012), Euclidean distance (ED) (Hill et al., 2013), QTLseqr for quantitative trait loci (QTLs) (Mansfeld and Grumet, 2018), and varBScore (Dong et al., 2020). To perform BSA mapping, sequencing data (i.e., WGS, WES, or RNA-seq data) for at least two contrasting bulk samples from segregating populations should be provided. The populations used for BSA can be either temporary (such as F2 and F3) or permanent (such as recombinant inbred lines [RILs] or doubled haploids [DHs]). Genotyping data from the respective parental lines can be used to filter out non-genetic variants in the segregating populations, significantly improving the accuracy of the mapping results. Furthermore, as thousands of datasets are stored in the database, the user can also select several datasets in the background group as an external reference to help exclude non-causal variants. Next, several key parameters should be set based on the genetic background of the population.
“Read Depth” filters out variants with low coverage, “Allele Frequency” removes variants with high heterozygosity (default of 0.36 for F2 populations and 0.3 for other populations), “SNP Window Size and SNP Step Size” can be used to adjust the resolution of the coefficient of linked markers, and “Population Structure and Ref Allele Frequency” is used to apply the QTL sequencing model and filter variants. Default values and an explanation of each parameter can be found using the Help icon close to the parameter input box. Once a job has been submitted via “Mapping via BSA,” the platform automatically generates an SNP density plot, as well as a Manhattan plot showing the varBScore, ED, and SNP index along the genome. The entire process takes several minutes, with processing time depending on the number of samples and type of data provided. A typical gene mapping workflow is shown in Figure 1C. More information about how to perform BSA on WheatGmap is available on the web site.
On the “Variants” page, users can search for SNPs or InDels by gene identifier (ID) or target region in the selected samples. Moreover, users can group samples and combine cultivars based on traits or genetic background, and can analyze the underlying genomic variations. The web page returns the SNP frequency associated with each group. The “Gene Information” page integrates information such as chromosome location, functional description, associated Gene Ontology (GO) terms (Ashburner et al., 2000), and Pfam domains (https://pfam.xfam.org) (El-Gebali et al., 2019). WheatGmap also provides information concerning homologs from rice (Oryza sativa), Arabidopsis (Arabidopsis thaliana), maize (Zea mays), and barley (Hordeum vulgare). Users can query multiple gene IDs at one time and obtain corresponding functional descriptions. When designing primers or comparing sequences for a candidate gene, users can obtain their sample/bulk sequence from an alternate sequencing tool. Users can also upload a variant call format file and enter the starting genomic position for their gene of interest. WheatGmap returns the alternate sequence based on the reference genome.
RNA-seq datasets from public databases were quantified with the IWGSC RefSeq v.1.1 annotated genes and integrated into the WheatGmap gene expression database. Detailed information for these samples, including the tissues of origin, growth conditions, and developmental stages, is also provided. Users can explore the expression profiles of candidate genes by submitting one or multiple gene IDs. An interactive graph is generated after query submission, and users can select the type of data displayed. GO and pathway enrichment analysis is performed by KOBAS 3.0 (Xie et al., 2011), with default values (hypergeometric test/Fisher's exact test). Due to the current lack of wheat datasets, the related datasets from Arabidopsis and rice are compared. We use Sequenceserver (https://sequenceserver.com) to power BLAST tools.
Both nucleotide and protein sequences can be queried against the wheat (IWGSC RefSeq v.1.0 genome and annotation v.1.1), rice (MSU Rice Genome Annotation Project Release 7), Arabidopsis (TAIR10), maize (B73 RefGen_v4), and barley (Hv_IBSC_PGSB_v2) genomes and proteomes. A typical workflow for BSA performed on WheatGmap is illustrated here with the use of our published data on the wheat yellow-green leaf mutant ygl1 (Dong et al., 2020). In the “Mapping Via BSA” section, TCE000006 (a green-leaf phenotype) was selected as the wild-type bulk sample, and TCE000005 (a yellow-leaf phenotype) was selected as the mutant bulk sample. The wild-type parent, mutant parent, and background sections were left blank. As the samples were from the segregating BC2F3 population and the genotypes at the target gene were homozygous, the “Allele Frequency” was set to 0.3 and the “Population Structure” was defined as “RIL/DH/F3/Homozygous Plants Bulk” for both bulks, which were derived from homozygous plants. All other parameters were kept as the default. The results window showing SNP density, varBScore, ED, G′, and SNP index for the two bulk samples was automatically generated several minutes after the job was submitted. The candidate region was restricted to ∼671 Mbp on chromosome 7A (Figure 1C), which includes the candidate gene identified in our previous study (Dong et al., 2020).
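To make the statistics behind such a run more concrete, the following minimal sketch computes two of the BSA models named above for a pair of bulks: the SNP index (the fraction of reads carrying the alternate allele) and the Euclidean distance between the allele-frequency vectors of the two bulks, followed by a sliding-window average. It is an illustration under stated assumptions, not WheatGmap's implementation: the `Site` layout, the function names, the `min_depth` threshold, and the toy read counts are all hypothetical, and sites are assumed to be biallelic and to lie on a single chromosome.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Site:
    """Read counts for one biallelic variant in two bulks (hypothetical layout)."""
    pos: int       # position on a single chromosome, in bp
    wt_ref: int    # reference-allele reads, wild-type bulk
    wt_alt: int    # alternate-allele reads, wild-type bulk
    mut_ref: int   # reference-allele reads, mutant bulk
    mut_alt: int   # alternate-allele reads, mutant bulk


def snp_index(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads that carry the alternate allele."""
    return alt_reads / (ref_reads + alt_reads)


def passes_depth(site: Site, min_depth: int) -> bool:
    """Keep a site only if both bulks reach the minimum read depth."""
    return (site.wt_ref + site.wt_alt >= min_depth
            and site.mut_ref + site.mut_alt >= min_depth)


def delta_snp_index(site: Site) -> float:
    """Mutant-bulk SNP index minus wild-type-bulk SNP index."""
    return (snp_index(site.mut_ref, site.mut_alt)
            - snp_index(site.wt_ref, site.wt_alt))


def euclidean_distance(site: Site) -> float:
    """Distance between the (ref, alt) frequency vectors of the two bulks."""
    wt = snp_index(site.wt_ref, site.wt_alt)
    mut = snp_index(site.mut_ref, site.mut_alt)
    return sqrt((mut - wt) ** 2 + ((1 - mut) - (1 - wt)) ** 2)


def window_means(positions, values, window, step):
    """Average values over windows of `window` bp, advancing by `step` bp.

    Returns (start, end, mean) tuples; windows without any site are skipped.
    """
    results = []
    if not positions:
        return results
    start, last = 0, max(positions)
    while start <= last:
        end = start + window
        inside = [v for p, v in zip(positions, values) if start <= p < end]
        if inside:
            results.append((start, end, sum(inside) / len(inside)))
        start += step
    return results


# Toy data (invented for illustration): the first site is nearly fixed for
# the alternate allele in the mutant bulk only; the last one has low depth.
sites = [
    Site(100, 18, 2, 0, 20),
    Site(600, 10, 10, 10, 10),
    Site(1200, 2, 1, 0, 3),
]
kept = [s for s in sites if passes_depth(s, min_depth=5)]
positions = [s.pos for s in kept]
deltas = [delta_snp_index(s) for s in kept]

for start, end, mean in window_means(positions, deltas, window=1000, step=500):
    print(f"{start}-{end} bp: mean delta SNP index = {mean:.2f}")
```

In this sketch, a window whose mean delta SNP index approaches 1 (or whose Euclidean distance approaches its maximum of √2) marks a region where the two bulks carry different alleles, which is the kind of signal a candidate interval is expected to show. The window and step arguments play the role that the “SNP Window Size and SNP Step Size” parameters are described as playing on the platform, although the exact smoothing used there may differ.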
More examples are provided on the “Gallery” page (https://www.wheatgmap.org/document/gallery/). Suggestions for preparing the populations and bulk sequencing are available on the web site (https://wheatgmap.org/document/materials/). The reference genome (IWGSC RefSeq v.1.0) facilitates the identification of markers near the region harboring a candidate causal gene. The development of new genomic resources, such as pan-genomes or graph-based genomes, will further improve gene mapping, especially for genes in natural populations that are absent from the Chinese Spring reference genome. We encourage researchers to share their published genomic data along with genetic and phenotypic information, since this information will increase the prediction accuracy of the platform and improve access for other wheat researchers by centralizing all the results in one place. The WheatGmap platform will be regularly updated by collecting and incorporating more genomic, phenotypic, and expression data and adding more interactive tools for data visualization. Furthermore, a new computing platform will be introduced in which users will be able to upload raw sequencing data for analysis. With more datasets and improved tools, WheatGmap will become a convenient and powerful platform for wheat gene mapping and functional studies.

Articles published on Noncausal Variants

PaintorPipe: a pipeline for genetic variant fine-mapping using functional annotations.

Causality and functional relevance of BRCA1 and BRCA2 pathogenic variants in non-high-grade serous ovarian carcinomas.

A robust association test with multiple genetic variants and covariates.

A Bayesian hierarchically structured prior for rare-variant association testing.

A Powerful Adaptive Cauchy-Variable Combination Method for Rare-Variant Association Analysis

WheatGmap: a comprehensive platform for wheat gene mapping and genomic studies

Identifying rare variants for quantitative traits in extreme samples of population via Kullback-Leibler distance

Leveraging allelic imbalance to refine fine-mapping for eQTL studies.

Ancestry-specific association mapping in admixed populations.

Illustrating, Quantifying, and Correcting for Bias in Post-hoc Analysis of Gene-Based Rare Variant Tests of Association.

Genome-wide association studies using a penalized moving-window regression.

An adaptive strategy for association analysis of common or rare variants using entropy theory.

Detecting disease association with rare variants in case-parents studies.

Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level.

Translating Lung Function Genome-Wide Association Study (GWAS) Findings: New Insights for Lung Biology.

Using Population Genetics to Interrogate the Monogenic Nephrotic Syndrome Diagnosis in a Case Cohort.

A general approach for combining diverse rare variant association tests provides improved robustness across a wider range of genetic architectures.

Statistical selection strategy for risk and protective rare variants associated with complex traits.

Assessing the Power of Exome Chips.

Detecting disease association signals with multiple genetic variants and covariates
